The data explored in this report comes from College Scorecard. College Scorecard is a product of the U.S. Department of Education and contains college statistics from 1996 to 2014, though this particular analysis will only be looking at four-year universities from the 2014 data.
##  STATE          FUNDING_TYPE              REGION        LATITUDE
##  CA     : 189   Public            : 554   5      :449   Min.   :13.43
##  NY     : 164   Private Non-Profit:1213   2      :372   1st Qu.:34.28
##  PA     : 116   Private For-Profit: 257   3      :301   Median :39.48
##  TX     : 100                             8      :274   Mean   :37.95
##  IL     :  83                             4      :193   3rd Qu.:41.79
##  FL     :  82                             6      :166   Max.   :64.86
##  (Other):1290                             (Other):269
##  LONGITUDE         ADM_RATE_ALL     TUITIONFEE_IN   TUITIONFEE_OUT
##  Min.   :-157.89   Min.   :0.0000   Min.   : 2019   Min.   : 2475
##  1st Qu.: -96.65   1st Qu.:0.5491   1st Qu.: 8633   1st Qu.:14682
##  Median : -85.18   Median :0.6871   Median :15024   Median :20868
##  Mean   : -89.39   Mean   :0.6663   Mean   :19222   Mean   :22650
##  3rd Qu.: -77.00   3rd Qu.:0.7935   3rd Qu.:28478   3rd Qu.:29526
##  Max.   : 144.80   Max.   :1.0000   Max.   :51008   Max.   :51008
##                    NA's   :623      NA's   :377     NA's   :377
##  UNDERGRAD_ENROLL   RETENTION        GRAD_DEBT_MDN   WDRAW_DEBT_MDN
##  Min.   :    0.0    Min.   :0.0000   Min.   : 2100   Min.   : 2113
##  1st Qu.:  820.5    1st Qu.:0.6647   1st Qu.:21000   1st Qu.: 8125
##  Median : 2052.0    Median :0.7546   Median :24750   Median : 9500
##  Mean   : 4952.4    Mean   :0.7274   Mean   :23781   Mean   : 9842
##  3rd Qu.: 5652.5    3rd Qu.:0.8347   3rd Qu.:27000   3rd Qu.:11118
##  Max.   :52280.0    Max.   :1.0000   Max.   :49750   Max.   :30250
##  NA's   :281        NA's   :436      NA's   :356     NA's   :357
##  COMPLETION_FIVE_YRS
##  Min.   :0.0000
##  1st Qu.:0.3689
##  Median :0.5026
##  Mean   :0.5050
##  3rd Qu.:0.6478
##  Max.   :1.0000
##  NA's   :430
There are roughly 2000 points of data, each of which was constrained to fifteen variables (name and ID number were excluded from the summary).
The dataset shows that there are approximately four times as many universities east of the -100 longitude line as there are west of it. One interesting note is that the heat map of university placement closely matches NASA light pollution images, which suggests that universities are concentrated in urban areas.
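The east-to-west comparison can be sketched in R roughly as follows. This is a minimal illustration, not the code behind the report: the `colleges` data frame and its values are made up, and only the `LONGITUDE` column name is taken from the summary output.

```r
# Toy stand-in for the College Scorecard table (values are invented).
colleges <- data.frame(
  LONGITUDE = c(-122.3, -118.2, -97.7, -87.6, -84.4, -77.0, -74.0, -71.1, NA)
)

# Longitudes are negative in the western hemisphere, so "east of the
# -100 line" means a longitude greater than -100.
side <- ifelse(colleges$LONGITUDE > -100, "east", "west")

# table() drops NA by default, so rows without coordinates are ignored.
counts <- table(side)
east_to_west <- counts[["east"]] / counts[["west"]]

print(counts)
print(east_to_west)
```

Note that under this simple rule any university with a positive longitude (the summary shows a maximum of 144.80) is also counted as "east", so territories in the eastern hemisphere may need to be filtered out first.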
The dataset helpfully breaks up location information into nine regions. Excluding the ninth region (which covers U.S. Territories like Puerto Rico), the fewest universities are found in region seven. A more visual representation of this data can be found in the Bivariate Plots section.
California leads in the number of universities per state, with New York following a reasonable distance behind. Puerto Rico manages to surpass several states, while many of the other territories lack even a single university.
## [1] "In-State Tuition Summary"
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 2019 8633 15020 19220 28480 51010 377
## [1] "Out-of-State Tuition Summary"
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 2475 14680 20870 22650 29530 51010 377
The median in-state tuition is slightly above $15,000. The median out-of-state tuition is close to $21,000.
## [1] "Graduate Debt Summary"
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 2100 21000 24750 23780 27000 49750 356
## [1] "Dropout Debt Summary"
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 2113 8125 9500 9842 11120 30250 357
Predictably, graduating students carry far more median debt ($24,750) than those who withdraw from university ($9,500). Also of note is how much more tightly clustered dropout debt is in dollar terms: its interquartile range is roughly $3,000, versus $6,000 for graduate debt.
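The spread of the two debt measures can be checked directly from the quartiles printed in the summaries. The sketch below only re-uses those printed numbers; the variable names are illustrative.

```r
# Quartiles copied from the summary output (dollars).
grad_debt  <- c(q1 = 21000, median = 24750, q3 = 27000)
wdraw_debt <- c(q1 = 8125,  median = 9500,  q3 = 11118)

# Interquartile range: the width of the middle 50% of universities.
iqr_grad  <- grad_debt[["q3"]] - grad_debt[["q1"]]
iqr_wdraw <- wdraw_debt[["q3"]] - wdraw_debt[["q1"]]

# The same spread expressed as a fraction of each median.
rel_grad  <- iqr_grad / grad_debt[["median"]]
rel_wdraw <- iqr_wdraw / wdraw_debt[["median"]]

print(c(iqr_grad = iqr_grad, iqr_wdraw = iqr_wdraw))
print(round(c(rel_grad = rel_grad, rel_wdraw = rel_wdraw), 2))
```

In absolute dollars the dropout spread is about half the graduate spread. Relative to each median the ordering does not hold, so the comparison is best read in dollar terms.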
## [1] "Undergraduate Enrollment Summary"
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 820.5 2052.0 4952.0 5652.0 52280.0 281
Enrollment varies wildly, which is to be expected given the wide range of university sizes. One notable outlier in the data is the University of Phoenix, which is the only university to report a six-figure enrollment (151,600). This data point has been excluded for most of the analysis in this report.
## [1] "Undergraduate Five-Year Completion Summary"
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.3689 0.5026 0.5050 0.6478 1.0000 430
Graduation rates follow a surprisingly normal distribution. There is a worrisome number of universities (~50) whose completion rates fall below 0.1. These values are distinct from NA, so it would seem that they were deliberately reported as such. Of note are their funding types (rarely public) and admission rates (generally an NA value).
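One way to separate genuinely low completion rates from missing ones is sketched below. The data frame is a small invented example; the column names match the summary output, and the 0.1 cutoff is the one mentioned in the text.

```r
# Toy stand-in for the dataset (values are invented).
colleges <- data.frame(
  FUNDING_TYPE = c("Public", "Private For-Profit", "Private Non-Profit",
                   "Private For-Profit", "Public", "Private Non-Profit"),
  ADM_RATE_ALL = c(0.65, NA, NA, 0.90, 0.70, NA),
  COMPLETION_FIVE_YRS = c(0.55, 0.00, 0.05, 0.30, NA, 0.08),
  stringsAsFactors = FALSE
)

# which() skips comparisons that evaluate to NA, so a missing completion
# rate is never mistaken for a low one.
low <- colleges[which(colleges$COMPLETION_FIVE_YRS < 0.1), ]

n_low <- nrow(low)                                      # reported, but below 0.1
n_missing <- sum(is.na(colleges$COMPLETION_FIVE_YRS))   # not reported at all

# Profile the low-completion group by funding type and missing admission rate.
funding_of_low <- table(low$FUNDING_TYPE)
share_adm_missing <- mean(is.na(low$ADM_RATE_ALL))

print(funding_of_low)
print(share_adm_missing)
```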
## [1] "Admission Rates Summary"
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.5491 0.6871 0.6663 0.7935 1.0000 623
Admission rates trend towards accepting more often than rejecting (the median is about 0.69), but I imagine that this varies with other conditions (tuition, university funding type, etc.).
## [1] "Retention Rates Summary"
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.6647 0.7546 0.7274 0.8347 1.0000 436
There is a concerning number of 0.0 and 0.01 retention rates. As before, though, the data draws a distinction between 0.0 and NA, so these were likely reported as such. Again, these points are similar in that they are rarely public universities and rarely have a listed admission rate.
Private Non-Profit schools make up a decided majority of the data points.
There are roughly 2,000 data points, each constrained to fifteen variables. The variables themselves can be grouped into four categories: location, admissions, finances, and identification.
The primary feature of interest is the completion rate, which offers a quantitative view of a university’s effectiveness. A low completion rate might imply exclusivity, but it also translates into students with debt and nothing to show for it.
I anticipate that the funding type (public, private) and selectivity (admission rate) will be strong indicators. Other qualities, such as location and retention, may also shed some light on the situation.
No new variables were created, though it may be handy to have some form of “success quotient” relating factors such as completion rate and low tuition.
University enrollment was adjusted for easier viewing, as there is a significant spread in the number of students enrolled.
Five-year completion rates had an unusually normal distribution.
The ninth region (U.S. territories) and Alaska were excluded from the longitude/latitude scatter plot. This was done for easier viewing.
The University of Phoenix was removed from the dataset, as its enrollment was more than 70 times the median. Its many sub-schools were kept.
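A removal like this could be expressed as a filter on enrollment relative to the median. The sketch below is only an illustration: the data, the `NAME` column, and the use of 70 as an explicit cutoff are all assumptions, since the report does not say how the row was actually dropped.

```r
# Toy stand-in for the dataset (names and values are invented).
colleges <- data.frame(
  NAME = c("School A", "School B", "School C", "School D", "School E",
           "Very Large School"),
  UNDERGRAD_ENROLL = c(800, 2000, NA, 2100, 5600, 151600),
  stringsAsFactors = FALSE
)

# Median enrollment, ignoring schools that did not report a figure.
med <- median(colleges$UNDERGRAD_ENROLL, na.rm = TRUE)

# How many times larger than the median each school is.
ratio <- colleges$UNDERGRAD_ENROLL / med

# Hypothetical rule: drop anything more than 70 times the median, but keep
# rows with missing enrollment so they stay available for other plots.
keep <- is.na(ratio) | ratio <= 70
trimmed <- colleges[keep, ]

print(trimmed$NAME)
```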
Other than in-state and out-of-state tuition, there are no obvious correlations. There are a few promising leads though, including tuition/completion rate and retention/completion rate.
Adding color to the geographic maps helps to better demonstrate the shape of the regions.
There’s considerable variation between regions with regard to completion rates. The Northeast regions (1, 2, 3) and West Coast region (8) have a median above 0.5. The Northwest (7), Central (4), and South (5, 6) regions fall below 0.5. U.S. Territories (9) fare worst, with a median completion rate near 0.3.
The funding type of a school is another notable indicator of completion rate. Private For-Profit universities fall far below Public universities in completion rate. Private Non-Profit universities have a noticeable, if not exceptional, advantage over Public schools.
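Group-wise comparisons like these can be reproduced with a grouped median. The following is a minimal sketch on invented data; only the column names come from the summary output.

```r
# Toy stand-in for the dataset (values are invented).
colleges <- data.frame(
  FUNDING_TYPE = c("Public", "Public", "Private Non-Profit",
                   "Private Non-Profit", "Private For-Profit",
                   "Private For-Profit", "Public"),
  COMPLETION_FIVE_YRS = c(0.50, 0.60, 0.55, 0.75, 0.20, 0.40, NA),
  stringsAsFactors = FALSE
)

# The formula interface of aggregate() drops rows with NA by default,
# so the Public school with no reported rate does not affect the median.
by_funding <- aggregate(COMPLETION_FIVE_YRS ~ FUNDING_TYPE,
                        data = colleges, FUN = median)
print(by_funding)
```

Replacing `FUNDING_TYPE` with `REGION` in the formula would give the per-region medians in the same way.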
## [1] "Correlation: In-State Tuition and Completion Rate"
## [1] 0.5099584
## [1] "Correlation: Out-of-State Tuition and Completion Rate"
## [1] 0.651668
Out-of-state tuition appears to be a decent indicator of completion rate (correlation of about 0.65). In-state tuition follows a similar pattern (about 0.51), but clustering at the low end, likely from tuition subsidies, weakens the trend.
Binning out-of-state tuition into a box-plot makes the trend a little easier to follow.
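The binning step might look something like the sketch below. The data are invented and the $10,000 bin width is an assumption; the report does not state which breaks were used.

```r
# Toy stand-in for the dataset (values are invented).
colleges <- data.frame(
  TUITIONFEE_OUT = c(4000, 9000, 12000, 18000, 24000, 31000, 38000, 47000, NA),
  COMPLETION_FIVE_YRS = c(0.20, 0.35, 0.40, 0.45, 0.55, 0.60, 0.75, 0.90, 0.50)
)

# Hypothetical $10,000-wide bins covering the observed tuition range.
breaks <- seq(0, 60000, by = 10000)
colleges$TUITION_BIN <- cut(colleges$TUITIONFEE_OUT, breaks = breaks, dig.lab = 5)

# Median completion rate per bin; rows with missing tuition fall in no bin.
bin_medians <- tapply(colleges$COMPLETION_FIVE_YRS, colleges$TUITION_BIN,
                      median, na.rm = TRUE)
print(bin_medians)

# Box-plot statistics per bin. Remove `plot = FALSE` to draw the chart.
bp <- boxplot(COMPLETION_FIVE_YRS ~ TUITION_BIN, data = colleges, plot = FALSE)
print(bp$n)
```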
## [1] "Correlation: Retention and Completion Rate"
## [1] 0.6577093
Retention is related to completion rates. This makes sense, since students not retained cannot graduate (though students who are retained may take longer than five years to graduate).
## [1] "Correlation: Admission Rate and Completion Rate"
## [1] -0.2769716
## [1] "Correlation: Enrollment and Completion Rate"
## [1] 0.189623
Neither admission rate nor undergraduate enrollment seems to be more than weakly related to completion rate (correlations of about -0.28 and 0.19, respectively).
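Because most columns contain NA values, correlations such as the ones above need to skip incomplete pairs. A minimal sketch, using invented data and the column names from the summary output:

```r
# Toy stand-in for the dataset (values are invented).
colleges <- data.frame(
  TUITIONFEE_OUT = c(8000, 12000, NA, 20000, 26000, 33000, 45000),
  RETENTION = c(0.55, 0.60, 0.70, NA, 0.78, 0.85, 0.95),
  COMPLETION_FIVE_YRS = c(0.25, 0.35, 0.50, 0.45, NA, 0.70, 0.90)
)

predictors <- c("TUITIONFEE_OUT", "RETENTION")

# use = "complete.obs" drops any pair where either value is NA; without it,
# cor() returns NA as soon as one value is missing.
correlations <- sapply(predictors, function(v) {
  cor(colleges[[v]], colleges$COMPLETION_FIVE_YRS, use = "complete.obs")
})
print(round(correlations, 3))
```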
The median debt of graduating students does not appear to have any discernible relationship to completion rate.
Out-of-state tuition, student retention, the funding type of university, and the region of the university all seem to have some relationship to completion rates. The size of the university and its exclusivity (admission rate) have less bearing. Likewise, the median debt of a graduate seems unrelated to completion rate.
In-state tuition and out-of-state tuition are correlated, which is to be expected, but the shape of their scatter plots offers some interesting insight into how tuition rates are set. There’s a strong clustering around the $7,000 mark and a break before the $20,000 mark. Beyond $20,000, in-state tuition looks nearly identical to out-of-state tuition. This shape implies that universities drawing a distinction between in-state and out-of-state tuition generally have in-state rates below $20,000.
Median graduate debt also exhibits a strange patterning. Hard lines exist at the $25,000 and $27,000 values, utterly independent of completion rate. This suggests a “standard value” of sorts that universities and financial aid packages aim for.
The funding type of a university has a noticeable relationship to its completion rate. For-profit universities have, on average, lower completion rates than either public or non-profit universities.
Adding funding type and region to the plot outlines an interesting divide between universities. For-Profit universities make up the lowest completion rates and tuitions. Non-Profit universities comprise the majority of high-tuition, high-completion rate schools. Public universities generally fall in-between.
Each region seems to experience this discrepancy in a different way. Region 9 (U.S. Territories) has a surprising number of non-profit universities. Region 8 (West Coast) seems to go against the trend with an odd patterning of public and non-profit schools. Region 5 (Southeast) has so many universities that it’s hard to see a trend.
Categorizing funding types by color gives this plot a new story to tell. Public universities manifest the most consistent relationship while non-profits are more erratic. For-profit universities, confusingly, show no pattern. That would imply that there is no connection between the number of students that graduate in five years and the number of students that choose to stay each year.
This plot had previously been an enigma. Looking through the lens of funding type, however, shows useful patterns. Public and non-profit universities have a fairly similar structure with graduate debt. Non-profits in particular show hard lines at 25,000 and 27,000.
For-profit universities show the highest amount of debt. I find this surprising, given it was previously established that they have lower tuition rates on average.
Tying together out-of-state tuition, university funding type, and region proved a very effective means for understanding completion rates. Independently they have weak relationships to one another, but when drawn together, they show some remarkable patterns.
That for-profit schools have some of the lowest tuitions but highest graduate debts was unexpected (and worrisome). I would be curious to discover why this is the case. Perhaps there are financial aid restrictions in play? Or is it the demographics?
For-profit schools seem strange on the whole. In graduate debt and retention, they follow completely different patterns than other universities. Is this because of differences in student demographic? Or something else entirely?
N/A
This plot simultaneously outlines the relative proportions of universities in the U.S. while providing context for the “Region” variable that’s referenced throughout. Readers can quickly deduce that the majority of universities exist in the Eastern U.S. They can also see the vivid outline of the West Coast and blatant sparsity of the Midwest.
There’s a lot going on in this map, but it demonstrates some crucial relationships. First, it shows the relationship between university completion rate and tuition. Some regions demonstrate this more than others, but the overall trend can be observed. Second, it illustrates a national pattern where public, non-profit, and for-profit schools exist on a spectrum. Dividing the map by region reduces the number of outliers viewers have to sift through. Lastly, the chart juxtaposes the various regions with one another. It’s easy to see at a glance that regions 9 and 7 have fewer universities than 5 or 2.
This plot was a surprise. It shows the overall pattern of completion rate and tuition fees, but also the effect of in-state tuition. There isn’t a simple left-shift as one might expect; rather, it’s as though there’s some gravitational force pulling points towards the $5,000 mark.
It’s also interesting to note that the plot is relatively unaffected past the $20,000 mark. Given the similarity in shapes between the two graphs, one can deduce that few schools beyond that point offer different in-state tuitions.
The College Scorecard is a vast repository of information across several years. This analysis covered data for four-year universities in the year 2014. I placed an emphasis on fifteen variables in particular, which detailed location, admissions, finances, and identification.
My focus was on investigating the completion rate of various universities. Tuition rates, university funding types, and region were found to be influential. The most significant influence came from the university funding type, as for-profit schools on average have half the completion rate of non-profit schools. Similarly, higher tuition rates seem to go hand-in-hand with higher completion rates.
There was some difficulty in importing, cleaning, and understanding the College Scorecard data. The data itself is massive; my spreadsheet program could not open the .csv because there were simply too many columns. Many of the column names were unintuitive as well (C150_4 comes to mind).
I was pleased with how questions raised by the Bivariate Plots were answered by the Multivariate Plots. Looking at the data through the lens of university funding type showed just how distinct each group is. Future work could investigate the patterns within each funding type and across multiple years.